[Improvement] Optimize count operation for iceberg #22923
run buildall
clang-tidy review says "All clean, LGTM! 👍"
@@ -132,10 +134,29 @@ Status IcebergTableReader::init_reader(
_all_required_col_names, _not_in_file_col_names, &_new_colname_to_value_range,
conjuncts, tuple_descriptor, row_descriptor, colname_to_slot_id,
not_single_slot_filter_conjuncts, slot_id_to_filter_conjuncts);
_batch_size = parquet_reader->get_batch_size();
The batch size is already available in state->query_options().batch_size.
(From new machine) TeamCity pipeline, clickbench performance test result:
}

private long getCountFromSnapshot() {
    Snapshot snapshot = icebergTable.currentSnapshot();
Every call to getCountFromSnapshot reads the table's current snapshot, so different calls may see different snapshots.
BTW, we support time travel queries on Iceberg tables. How is that handled here? For example:
select from iceberg of timestamp xxxx
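One possible way to address both points, sketched here under assumptions rather than taken from this PR: resolve the snapshot once per query (the one in effect at the requested timestamp, or the latest one otherwise) and reuse that snapshot id for every later lookup. The class `PinnedSnapshot`, the method `snapshotIdAsOf`, and the `TreeMap` standing in for the table history are all hypothetical; a real implementation would read the history from the Iceberg table itself.

```java
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical helper, not part of Doris or Iceberg.
final class PinnedSnapshot {
    private PinnedSnapshot() {}

    /**
     * Returns the id of the snapshot that was current at timestampMillis.
     * history maps commit timestamp (ms) to snapshot id and stands in for
     * the table's snapshot log (an assumption made for this sketch).
     */
    public static long snapshotIdAsOf(NavigableMap<Long, Long> history, long timestampMillis) {
        // floorEntry picks the latest commit at or before the requested time.
        Map.Entry<Long, Long> entry = history.floorEntry(timestampMillis);
        if (entry == null) {
            throw new IllegalArgumentException(
                    "no snapshot exists at or before " + timestampMillis);
        }
        return entry.getValue();
    }
}
```

The caller would resolve the id once and pass it to whatever computes the count, so the planner and the count pushdown agree on the same snapshot.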
fixed
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Proposed changes
Iceberg maintains its own metadata, which includes row-count statistics for the table data. If the table contains no equality deletes, the row count of the current table can be read directly from those statistics instead of scanning the data files.
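A minimal sketch of that check follows. It is an illustration, not the code in this PR. It assumes the count is derived from the snapshot summary properties `total-records`, `total-position-deletes`, and `total-equality-deletes`, and that each position delete removes exactly one live row; the class `SnapshotCount`, the method `countFromSummary`, and the `CANNOT_PUSH_DOWN` sentinel are hypothetical names. The summary is modeled as a plain `Map` so the example runs without the Iceberg library.

```java
import java.util.Map;

// Hypothetical helper, not part of Doris or Iceberg.
final class SnapshotCount {
    /** Sentinel meaning "fall back to a normal scan". */
    public static final long CANNOT_PUSH_DOWN = -1L;

    private SnapshotCount() {}

    /** Derives the row count from a snapshot summary, or returns CANNOT_PUSH_DOWN. */
    public static long countFromSummary(Map<String, String> summary) {
        String equalityDeletes = summary.get("total-equality-deletes");
        String totalRecords = summary.get("total-records");
        if (equalityDeletes == null || totalRecords == null) {
            // Statistics are missing, so the metadata cannot answer the query.
            return CANNOT_PUSH_DOWN;
        }
        if (Long.parseLong(equalityDeletes) != 0L) {
            // Equality deletes can match any number of rows; the count is unknown
            // without reading the data.
            return CANNOT_PUSH_DOWN;
        }
        // Assumption: each position delete removes exactly one live row.
        long positionDeletes =
                Long.parseLong(summary.getOrDefault("total-position-deletes", "0"));
        return Long.parseLong(totalRecords) - positionDeletes;
    }
}
```

For example, a summary with `total-records=100`, `total-position-deletes=10`, and `total-equality-deletes=0` gives 90, while any nonzero `total-equality-deletes` gives the fallback sentinel.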